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Abstract 

In this study, a compact finite-difference discretization is first developed for 
Helmholtz equations on rectangular domains. Special treatments, then, are introduced 
for Neumann and Neumann-Dirichlet boundary conditions to achieve accuracy and sep- 
arability. Finally, a Fast Fourier Transform (FFT) based technique is used to yield a 
fast direct solver. Analytical and experimental results show this newly proposed solver 
is comparable to the conventional second-order elliptic solver when accuracy is not a 
primary concern and is significantly faster than that of the conventional solver if a highly 
accurate solution is required. In addition, this newly proposed fourth order Helmholtz 
solver is parallel in nature. It is readily available for parallel and distributed comput- 
ers. The compact scheme introduced in this study is likely extendible for sixth-order 
accurate algorithms and for more general elliptic equations. 
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1 Introduction 


Obtaining a more accurate numerical solution, in general, means adding more discretization points 
and using smaller mesh sizes, which cost both computing time and storage space. The demand for 
more accurate solutions is a driving force for more powerful computers [15]. On the other hand, 
however, high performance computers require high-order accurate discretization methods to match 
their computation power and to explore the potential of high-performance computing. With the 
availability of high performance computers, the current barrier in utilizing existing hardware in 
many situations is the lack of high-order accurate numerical methods. In this study we introduce a 
high-order direct solver for Helmholtz equations with Neumann boundary conditions. This newly 
proposed solver achieves fourth-order accuracy with a computation count compatible with the best 
existing second-order algorithm. Of equal importance is its parallel nature and its readiness for 
parallel and distributed computers. 

Solving Helmholtz equations is a key issue of scientific computing. Intensive research has been 
done in recent years in the field to develop efficient numerical methods [16]. In the late sixties, 
Hockney [4] and Bunemann [1] developed fast direct methods for elliptic equations on rectangular 
uniform meshs. These methods take advantage of the special block structure of the resulting sys- 
tem of finite-difference discretizations and reduce the number of computations considerably. While 
these methods have been highly recognized for their practical importance, they are based on sec- 
ond order approximations. By adopting a novel finite-difference discretization, in 1979 Houstis and 
Papatheodorou [6] proposed a direct Helmholtz-Dirichlet solver with fourth-order accuracy. This 
fourth-order solver combines the techniques of Fourier transform and cyclic reduction and is as 
fast as the second-order solvers developed by Hockney and Bunemann. The main drawback of 
Houstis and Papatheodorou’s method is that it is designed for Dirichlet boundary conditions only. 
Although, analytically, solving a Neumann problem is equivalent to solving two Dirichlet problems 
for Helmhotz equations [7], due to the high transformation cost 1 Houstis and Papatheodorou's 
method is unacceptably slow for Helmholtz equations with Neumann or Neumann-Dirichlet condi- 
tions. 

Finite-element schemes provide another alternative for high-order discretizations. By using 
Rayleigh-Ritz-Galerkin approach with tensor product B-splines, Kaufman and Warner [8, 9] devel- 
oped a direct solver more recently for separable elliptic equations on rectangular domains. Their 
method permits high-order discretizations and works for both Dirichlet and Neumann boundary 
problems. While Kaufman and Warner’s method is a more general elliptic solver, it has a compu- 
tational complexity of 0(N 3 ) on a square domain of size N x N. 

In this paper, we propose a fourth-order direct solver for Helmholtz equations 

Pxx + Pyy - AP = R(x, y) in f), (1) 

1 As shown in Appendix B, the complexity of computing the transformation influence matrix is at least 
0(N 3 log o log 2 N) on a square domain of size N x N. 
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where Q is a rectangular computational domain with Neumann boundary conditions imposed on at 
least one dimension, A > 0, R is a given function, P is the scalar-valued function to be solved for. 
When A = 0, (1) becomes a Poisson equation. Mathematical analyses are conducted to prove our 
newly proposed solver is fourth order accurate and requires 7V 1 2 ( 5 log 2 iV-f 17) + 1197V operations on a 
TV x TV domain. These analytical results are confirmed by experimental measurements. Performance 
comparison has been made against the best second-order algorithm available. Analytical and 
experimental results show this newly proposed high-order solver is significantly faster than existing 
algorithms for Helmholtz equations with Neumann conditions, especially for large applications. 

The main mathematical tool used in this study is a newly-derived finite-difference discretization 
scheme, which is a generalization of the compact finite-difference scheme originally proposed by 
Kreiss and Oliger [10]. This paper is organized as follows. Compact schemes are presented in 
Section 2. A high order discretization is derived for Laplace operator. Our new discretization 
scheme for Helmhotz equations is introduced in Section 3. Based on the newly developed finite- 
difference scheme, a high-order fast solver is proposed in Section 4 for Helmholtz equations with 
Neumann conditions. The accuracy and efficiency of this algorithm are also analyzed. Extension 
of the fast solver to Neumann-Dirichlet conditions is discussed in Section 5. Finally, testing results 
are presented in Section 6. Section 7 gives the conclusion. To provide an appropriate comparison, 
accuracy analysis of the conventional second order solver and the influence matrix method are 
presented in Appendix A and B, respectively. A detailed error estimation of the newly proposed 
algorithm is conducted in Appendix C. 


2 Compact Difference Schemes 


Compact finite-difference schemes are derivative approximation methods that express a linear com- 
bination of derivatives in terms of a linear combination of function values [13]. As originally 
suggested by Kreiss and Oliger [10], and later discussed for fluid dynamics problems by Hirsh and 
Lele [3, 11], the first and second derivatives for compact differences can be approximated by 


and 


where 


fn = 


f" = 

J n 


Do 

1 + 1 h 2 D + D. 
D+D _ 

1 + ±h 2 D + D- 


- \ fn with error - ^/ (5) 


fn 


with error 


240 


/ (6) , 


Dofn — r^(/n+l — fn— l)i D+fn — ^(/n+1 /n): D-fn ^(/n 

and h is the mesh size. Multiply (2) and (3) by their respective denominators yields 


1 f t 2,1., _ fn+l fn - 1 

gTn-1 + 3/n + g/n-1 2 /l 


( 2 ) 

(3) 

(4) 
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and 


J _ rff , ® rtf j f ff _ /n+1 2/ n + fn-l 

12 /n_1 6 /n ^ 12 /n_1 “ h 2 


(5) 


As Hirsh has shown, the above two compact schemes yield a smaller difference stencil and higher 
accuracy for approximating /' and /" than the traditional fourth order central-difference. 

Compact scheme has traditionally been used only for approximating derivatives with known 
function values. For instance, compact schemes have often been used to compute one dimensional 
time derivatives in the iterative solving of differential equations where the function values of the 
last iteration are known. To approximate high-dimensional spatial differential operators, however, 
using 1-D compact schemes to approximate corresponding 1-D components of Partial Differential 
Equations (PDE) does not necessarily lead to a solution. For example, to discretize the Poisson 
equation 

Pxz Pyy = y) (6) 


if we use (5) to approximate P xx and P yy we will end up with 


_L 

12 J xx 


_i_ £ pij + 4_ _L pij- 1 _i_ I pij , _L pij+1 

T 6 xx “ i2 x xx “i2 yy ~ 6* yy ^ 12* yy 

= 4- pi+lj + pij+l 4. pij - 1 _ 4 pi, 3 } . 


(7) 


The left-hand side of (7) is not a linear combination of the Laplace operator A P = hence 

relation (6) cannot be used to reduce (7) into a solvable linear equation. Using compact schemes 
for the solution of spatial differential operators is still state-of-the-art and very challenging. Before 
deriving a high-order solution for Helmholtz equations, we first need to investigate the applicability 
of compact schemes for two dimensional Laplace operators. 

An example of compact finite-difference scheme for the 2-D Laplacian is given below when both 
x and y dimensions take the same uniform mesh size h: 


A P 1 ' 3 = 


^{D%D y + + D\Dl + D X _D\ + DlD y _) + D%D X + D y + D y _ 
1 -f ±h 2 (DiD x _ + D\D y _) 


pij 


where 


pxpij = l.[pi+lj _ pij^ J)*_ pij = ^(pij - P l ~ l ' 3 ). 

— l.(pt,j+l _ pij £)V_pij — ^(pij 

The truncation error of (8) is 


Error{AP ly i) = /i 4 


i a 6 a 6 i a 6 a 6 ' 

m a v o_4 a. .9 o_9 .4 ) a<?n V o~.fi o_.fi ) 


216 dx 4 dy 2 dx 2 dy 4 360 ax 6 dy e 

The difference stencils of (8) can be seen easily in the form of 


P hJ . 


( 


/_! _i _I 
/ 4 1 4 


i 1 H AP* J = h 


\ 


l 



( 8 ) 

(9) 


(10) 
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A more general form of the compact difference scheme for the Laplace operator is 


AP ij + a (AP i-1 ^ + A P i+l ' j + A + AP* ,J+1 ) 

+ 0 (AP i-1J ' _1 + AP i+1,J_1 + AP i-1J+1 + AP i+1J+1 ) 

= a p 

L p*-i,j-i + pt+i t i-i + pi-io+i_j_pt+i J+i_4pt,j 

+ b 2P ' 


( 11 ) 


The values of the parameters a. f>, and a, 0 can be derived by matching coefficients of the corre- 
sponding Taylor series of the Laplace operator for various orders of accuracy. The first unmatched 
coefficient determines the truncation error of the approximation (11). After some mathematical 
manipulation, direct matching leads to the following results: 


l+4a + 0 = a + b 


(second order accuracy), 


a + 2 0 =*$ 1 
2a + 4/3 = 5 J 


(fourth order accuracy). 


The highest order (11) can reach is 0(h 4 ), and (10) is the optimal one with three non-zero coeffi- 
cients, which is attained with 0 set to 0. 

Compared with conventional fourth order finite-difference 


A P l ' j 


4 pi-1.7+P'+ 1 .j + pi,J — l4.p»,j + l_4p».j 

3 ' h 7 

I P , - 2, >+P i+2 ' J +f > '^~ 2 +P l -- ,+2 — 4P*'-’ + o(h 4 ) 


though compact scheme (10) does not reduce the number of stencils, it has the advantage in 
handling grid points in the neighborhood of the boundaries (see [2] pp. 568). More importantly, 
its block tridiagonal structure makes fast direct solving possible. 

Discretization (10) is for Laplace operators. To apply compact schemes to Helmholtz equation 
(1), we need modify discretization (2) and (8) accordingly. Two basic finite-difference formulas are 
needed for the modification. They are: 


D 2 f n = D + ZA/ n + 0(h 2 ), 


( 12 ) 


and 

AP 1 ' 1 = {D x + D x _ + D y + D y _)P lJ + 0(h 2 ), (13) 

where D%, D x , £>+, and D y _, are defined in (9), D + and D_ in (4), and D 2 denotes the second 
derivative. 

Both (2) and (8) are of order four. Replacing D + D_ in the denominator of (2) by the second 
differential operator D 2 . and (D x + D x _ + D y + D y _) in the denominator of (8) by the Laplace operator 
A, yields 



and 


AP ,J = 


liDlDj + D%Dl + D^Dj + D*D y _) + B%B*_ + £>»£)» \ pi j 


1 + T2^ 2 A 


) 


The order of truncation error of the two modified compact schemes remain the same. Multiplying 
each equation with its corresponding denominator, the explicit forms of the above two equations 
are given by 

1 l ~ ... 

(14) 


^(/n+l - fn—l) = fn + \ U 2 /n" + 0{h 4 ) 


1 

2 h 


and 



Q J,2 

P* J = — -AP tJ - —A 2 P*’ J + 0(/i 4 ), 

2 8 


(15) 


respectively. 


3 High Order Discretization for Helmholtz Equations 

We assume equation (1) to be solved in a rectangular region [0, x\ x [0, y). This rectangular domain 
is partitioned into uniform mesh size h in both dimensions, yielding grid points 

(0,0), (fc,0), (Mh, 0) ; ... ; (0 ,Nk), ( h,Nh ), . . ., ( Mh,Nh ). (16) 


We apply the modified compact scheme (15) to discretize equation (1) with grid (16). Substitute 
A P ly i by A P l J + and A 2 P 2,J by A 2 P^ + A R 1 ^ 4- A based on the equivalences derived 
from equation (1), equation (15) becomes 



Move all the P 1 ^ terms on the right-hand side of (17) to the left, yielding 


-2 


l~\ 

-i 

U 


-i 

5 + Xh7 ~ l2 \h 2 

-1 


-i\ 

-l 

-V 


h 2 


P Uj = ~^R l j - — (\R i j + A R l A + 0(/i 4 ). 

I 8 \ * 


(18) 


The approximation of A R in the above equation only needs to be second order accurate for the 
final solution P to remain fourth order accurate. 

To provide (18) with an adequately accurate boundary treatment necessitates a fourth-order 
first derivative approximation for the boundary conditions. As stated in Section 1, equation (1) 
has a Neumann-Neumann or Neumann-Dirichlet boundary conditions. Since Neumann-Neumann 
boundary problems and Neumann-Dirichlet problems share many common characteristics, we dis- 
cuss the solution of the Neumann-Neumann boundary problem in detail in this section and, then, 
extend the solution to Neumann-Dirichlet boundary problems in Section 5. 
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General Neumann-Neumann boundary conditions can be expressed mathematically as 


=- = b{x,y) ondn, (19) 

an 

where dtt is the boundary of the rectangular domain 0, n is the normal vector of the boundary of 
the computational domain, and ^ indicates the derivative normal to the boundary. 

Applying equation (14) at grid point (0, j), we have 


h ' 2 „ . pb j 

_ P°iJ 4. pOj _ £ 

6 xxx x 2h 


F _lj j 

- + 0(/t 4 ). 


Incorporating this result into boundary condition (19) yields 



pi.j _ p-ij 

2 h 


0(h 4 ). 


( 20 ) 


Difficulty exists in using Equation (20), however. The boundary function b(x,y) is usually only 
available at the boundaries, but not in a small neighborhood of the boundaries. Thus, the derivative 
of function b{x,y) can be found only along the y-direction but not the x-direction for grid points 
(0, j). To overcome this difficulty, we reformulate derivative b°’J by incorporating information from 
the equation (1), 


b°J 

U XX 


= ^ = l (AF 0J - P,V) - - ^P y V 

- + AP ° J ) ~ W P * 3 =R°x } + -b° y l 


Therefore, we have transformed (20) into 


pij _ p-i-j 


2 h 


= J (X, - 4') + (1 + ^)» 0J + Ol'* 4 )- 


( 21 ) 


The approximation of b°’J in the above equation only needs to be second order for (21) to remain 
fourth order, and it is feasible to carry the y-direction derivative approximation at the boundary 
of x-direction. 

The value P -1 >j in (21) which lies outside the computational domain is eliminated when both 
equations (21) and (18) are applied to grid points (0, j). Through the above discretization and 
derivation, the Helmholtz equation (1) and the boundary conditions (19) now can be incorporated 
into the following linear system: 

A P = -~h 2 R - ^ (XR + A R) + 2hU + 0(h b ) . (22) 

2 8 

The vector U results from the boundary conditions and vanishes at interior points. It has different 
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forms along the four boundaries and at the four corners and is given by 


U°' j = 
U m ’i = 


_3 P°£ _ xl2 (P°' j ~ 1 +4P°' j +P°- j + 1 ) _ L 2 (fix J ~ 1 +4flg- J -ffig j+1 ) 

2 24 n 24 ' 

3 Pp J + A ^2 (P J m ' J - 1 +4Pr j + Px mJ+1 ) + ^2 (Pr ,;, ~ 1 +4Pr J +flr' J + 1 ) 


CP’° = 
17* ,n = 


3Pr _ A ^2 (Pr - +4P’- u +P’ +1 ’ U ) _ ^2 (flj~ i ’ 0 +4Rj°+Ri" i ~ 1 ' 0 ) 




24 

i- 1 ,n +4 p ^n +p ^ 1 ,n 


24 


24 




l +p; 


>i+l,Ti\ 


24 


(23) 

(24) 


for 1 < z < m — 1, 1 < j < n — 1, and 


C/0’0 = _(i + 4£) ( 5 (p0.0 + P®’®) + pO,l + pbO) _ h 2 ° 


+ 




24 


rrm,0 / 1 i 

u ~ U ^ 24 


I + W) ( 5 ( p i m, ° - ^T’ 0 ) + p ^’ 1 - ^ m ~ 1,0 ) + ft 2 - ■ " ■■■ ' ~ R 

5( p m ,0 _pm 1 °)_i 2 ( p ma_pm- 1 , 0 ) +g ( p m,2_ p m-2/)_2( p m, 3 _ p m>3j 

24 


-iff’ )+**“*% 


24 




C/O’" = -(i + (5(P0’" - P°’ n ) + PO’"- 1 - P^ 1 ’") - 




24 


+ 


5(Pg ,n ~ Pg n ) — 12(Pj <n ~ 1 — Pj ,w )4*9(Pj >n ~ ^ — Py ' n ) — 2(Fg ,n ~ 3 — pg ,n ^ 


24 


[/m T n = (I + Aj£) (5(P^ n + P™’ n ) + P^” 1 + P™" 1 ^) + /l 2 ^ 


,m,n +p m,n) + ^ 


24 


>777 * 71 ” 1 . n?Tl“ l,tl 

jJhL 


5 ( p m,n + pm,n)_ 12 ( p m,n-l +p ^ 


m ,n — 1 . rjm — l,n 


)+9(P 3 


m.n — 2 | p m — 2,n 


t )~2(P i 


iTn 'Ti — 3 | p m — 3.n 


24 


The matrix A is an (M + 1)(N + 1) by (M + l)(iV + 1) matrix given by 


/ A\ — 2x4 2 0-0 0 0 \ 

-A 2 A\ -A 2 -000 

0 0 0 - — A 2 A\ — A 2 

Vo 0 0-0 -2A 2 A x / 


(25) 


with matrix Ai of order ( N + 1) by ( N + 1) being 


( d - 2 0 

-1 d -1 


Ai = 


I 0 0 0 

\ 0 0 0 


0 0 0 \ 
0 0 0 

-1 d -1 
0-2 d ) 
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where d — 5 + A/l2 g +12 Ah 2 , and with matrix A 2 of order (N + 1) by (N + 1) being 

/ 1 i 0 0 0 0 

I 1 I ... 0 0 0 

a 2 

0 0 0 ■ •• i 1 i 

Vo 0 0 ... 0 \ 1 

The regular structure of matrix A allows it to be tridiagonalized by Fast Cosine Transform (FCT) 

A = FAF -1 , (26) 

where A is the resulting matrix, and F is an ( M + l)(Af + 1) by (M 4- 1 ){N + 1) block diagonal 
matrix with (N + 1) by (N + 1) diagonal submatrix Fi given by 

/II 1 1 

1 COS ( ^ ) cos (jf) cos (j/ r ) 

Fi = 1 cos(%) cos(^) cos(^) 

Vl cos(^) cos(^) cos( yp 

After tridiagonalization, equation (22) becomes 

h _2 FAF -1 P = B , (27) 

where B = -\R — ^ (A R + A R) + 2h _1 17. Finally, the solution is 

P = h 2 FA -1 F -1 £?. (28) 




4 The Fourth-Order Fast Solver 

Based on the mathematical study given in the last section, a fourth-order algorithm can be carried 
out in the following steps: 

1) Compute the cosine values to be used in the FCT. 

2) Compute the value of each entry in matrix A. 

3) Compute the vector U in equation (22). 

4) Compute the right-hand side of (27). 

This is done by adding the results from step 1) to — — yy (A R -I- A R) which is to be 

computed in this step. 
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5) Apply inverse FCT to the right-hand side of equation (27). 

This is done by multiplying matrix Ff 1 to each block Bi in the matrix £?, for i = 0,1, ..., M, 
where B x = ( , bn, . . . , i>i 7 v) T is the i-th N-component-long block of vector B. 

6) Solve the tridiagonal system AY = F ~ l B for Y, where Y — F ~ l P. 

Because of the particular structure of matrix A, we get N independent tridiagonal systems, 
each of size M. Reassemble the right-hand side F -1 2? according to the structure of A, and 
then solve the N tridiagonal systems. 

7) Apply FCT to vector Y computed in step 6) to recover P. 

This is done by multiplying matrix Fi to each of vector Y’s N-component-long block Y{ for 
i = 0, 1, M. 

The operation count of each step of the fourth-order algorithm on a square domain of N x N , from 
steps 1 to 7, is: 

1. N multiplications and 0.5iV additions; 

2. 6 N multiplications and 2 N additions; 

3. 36iV multiplications and 48 jV additions; 

4. N 2 + ION multiplications and 4A r2 + 12 N additions; 

5. N 2 log 2 N + N(\og 2 N — 4) multiplications and A7 2 (1.5 log 2 N + 1.5) — N log 2 N additions; 

6. 5 N 2 + 6^ multiplications and 3 N 2 + 4 N additions; 

7. N 2 log 2 N + N(log 2 N - 3) multiplications and A r2 (1.51og 2 N + 2.5) - ATlog 2 N additions. 
The total operation count of the algorithm is the sum of the work of the seven steps, which is 

N 2 (6 + 2 log 2 N) + N( 2 log 2 N -F 52) multiplications and 

A r2 (31og 2 N + 11) + A^(~21og 2 N + 67) additions (29) 

= 7V 2 (51og 2 JV + 17) + 119AT. 

It is N 2 + 45A r multiplications and 4A^ 2 + 53JV additions more than the operation count of the 
conventional second-order fast Poisson solver (see (40) in Appendix A [4]). 

Execution time in general is approximately proportional to the number of operations. To better 
understand the time-accuracy efficiency, the newly-proposed high-order direct solver, the conven- 
tional second order fast Poisson method [4], and the influence matrix method [7] are compared 
analytically in terms of operation counts, assuming all the three solvers are solving the same prob- 
lem and to meet the same accuracy requirement. 

The influence matrix method (see Appendix B) can be combined with the direct fourth-order 
Dirichlet solver [6] to solve Equation (1) with Neumann conditions. The analysis given in Appendix 
B shows that the influence matrix method has the same order of truncation error as our proposed 
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high-order solution. The final solution error depends on both the truncation error and the matrix 
of the discretized linear system, but the matrices of the two methods have eigenvalues with the 
same properties where solution errors are concerned. Therefore it is appropriate to conclude that 
the influence matrix method and the newly proposed high-order solver have the same accuracy. 
The operation count of the influence matrix method given by (44) is at least N times the operation 
count of our direct Neumann solver for problems of partition size no larger than 2 20 x 2 20 . So we 
can conclude that our method is approximately N times faster than the influence matrix method. 

The comparison of the conventional fast Fourier method for Poisson equations needs more delib- 
eration. For the sake of brevity, we restrict our discussion on the unit square domain [0, 1] x [0, 1]. 
With slight modifications, the same analysis can be conducted for general rectangular domains. 
We introduce the following notations: E{Mthd) denotes the difference between the exact solution 
and the numerical solution computed by method Mthd , e > 0 is the required accuracy, i.e. the 
difference between the computed numerical solution and the exact solution must be less than or 
equal to e. With these notations, now the error of our fourth-order method can be denoted by 
E {order A), and the error of the second-order Poisson solver will be E {order 2). By analysis given 
in Appendix C for the high order direct method, the solution error in general satisfies 

E{orderA) = a ■ h 4 for some problem dependent coefficient a. 

The error estimation for the conventional second order solution given in appendix A indicates 
that in general 

E {order 2) = b ■ h 2 for some problem dependent coefficient b. 

To meet the accuracy requirement for both the order 2 fast Poisson solver and our order 4 
method, the two solvers need to take different mesh sizes and partition sizes, say partition size N 
and mesh size h for our direct Neumann solution, and partition size iV 2 and mesh size h 2 for the 
order 2 method. Then E < £ , implies that 

a ■ h 4 = e for the 4th order direct Neumann solver 

and 

b • h 2 2 = £ for the second order method. 

Therefore 

a-h A = b-h. \ (30) 


Since 

h — 1/jV and h 2 = 1 / A 2 . 


(30) is equivalent to 


a 


b 



A relation between N and N 2 is 


N 2 = 



= CAT 2 , 


(31) 


where 



Thus, if our fourth order solver can satisfy the error tolerance by taking a partition of size N, then 
it requires the second order solver to take a partition size of CN 2 to achieve the same accuracy. 

The parameter C in general could vary largely from problem to problem. For a Poisson equation 
with polynomial solution of degree four, the new fourth-order method gives the exact solution while 
the second order fast solver does not (see Table 5). In these cases, C is close to infinity, which means 
the second order solver will never get the same accuracy as our method no matter what partition 
size it takes. For Poisson equations which have only twice differentiable solutions, the number C 
will be close to jr, which means the two methods have almost the same accuracy when taking the 
same partition size. Fortunately, for equations having at least fifth derivatives, the variation range 
of the coefficient C is not very large. 

Obviously C depends on the matrices of the two discretized systems for the two methods and 
also on the truncation errors. Since the eigenvalues of the two methods are almost the same, 
we can mainly use their truncation errors to compare the solution errors of the two methods. 
From the analysis given in appendix A and appendix C, we can see that the truncation errors are 
influenced by many factors such as the coefficients, the derivatives of the solution of the first five 
orders, the number of terms, and the number A in the equation (1). So the parameter C is heavily 
problem dependent. But the highest order terms of truncation error for both the fourth and second 
order solution clearly occur at the four corners of the computational domain. For problems whose 
derivatives of different orders differ within a factor of 100, and the coefficient A is not greater than 
400, substituting the ratio of a and b in (32) by the ratio of the truncation errors of the fourth and 
second solutions at the four corners, we estimate that C will fall into the interval of [yg, 8j. For 
“well behaved” equations, e.g. the derivatives of different orders differ within a factor of 5 and A 
is moderately small (see the examples given in Tables 1 to 5 and in Tables 7 to 10), the parameter 
C assumes a range of [4, 1]. 

Let T4 and T? denote the time needed by the order 4 and order 2 methods respectively to solve 
a problem within a given error tolerance. According to equations (40), (29), and (31), 


C 2 N \ 5 log 2 CN 2 + 12) + 21C 2 JV 
AT 2 (51og 2 iV + 17) + 119.7V 


(33) 


For equations where C ranges from \ to 1, the time ratio T 2 : T x will fall between jqN 2 and 2 N 2 
by formula (33). In other words, the traditional order 2 solver will take roughly j^N 2 to 2 N 2 times 
more computation than that of the newly-proposed fourth order solver to reach the same accuracy. 


11 



The difference is huge. The speed of the newly proposed algorithm is unmatched by that of other 
existing direct solvers when a fourth order accurate solution is wanted. 


5 The Modified Fast Solver for Neumann-Dirichlet Conditions 

In this section we discuss the fourth order solution of problems with Neumann boundary conditions 
imposed on y-dimension. and Dirichlet conditions on x-dimension. By symmetry, this solution can 
be extended to problems with Neumann boundary conditions imposed on x-dimension, and Dirichlet 
conditions on y-dimension as well. These boundary conditions can be expressed mathematically 

as: 

I dP^ y) _ k( x? y) along y-dimension boundaries ^4) 

\ P{x, y) — f(x, y) along x-dimension boundaries 

For these Neumann-Dirichlet problems, we apply the compact scheme (15) at interior grid-points of 
the problem domain, yielding (18). Similarly, we apply the 1-D compact scheme (14) to discretize 
the boundary conditions of v-dimensional boundaries. 

The discretized equation system now has the form 

A P = - -h 2 R - ^ (A R + A R) + U X + 2hU y + 0(/i 5 ), (35) 

2 8 

where U y , which depends on the Neumann conditions at the y-dimensional boundaries, is defined 
the same way as that of (23) and (24). U x depends on the Dirichlet conditions at the x-dimensional 
boundaries and is given by 

Tj\,j _ ipOj-l 1 pOJ _|_ lpOj+l 

rrm-l.j — lpmj-1 1 pm,j , lpmj + 1 
O x — '4 7 

for j = 1 , 2. .. . , n — 1 , and 

pl,0 _pO.O + lpO,l 5 Ijm- 1,0 = pm,0 + lpm,l ) 

jj l,n _ pO.n + Ip0,n-1_ \jm- l,n _ pm, n + lpm, n-1 

Equation (35) can be reduced to a sequence of independent tridiagonal systems either by Fast 
Cosine Transform (FCT) or by Fast Sine Transform (FST). Which transformation to use is a choice 
of efficiency. When N > M. FCT will lead to a better efficiency. Otherwise M < N, FST will be 
a good choice. If FCT is the choice, the vector P then is stored in the form of 

P — {Pi, o? Pi.i' > Pi.n ' ’ Pm- i.o’ Pm- i.i* ’ Pm- i.a). 
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Matrix A in equation (35), then, is an (M - l)(N + 1) by (M — 1 )(N 4- 1) matrix given by 


( Ai -A 2 0 

-A 2 A\ ~a 2 


A = 


0 

0 


0 

0 


0 \ 

0 


0 

V 0 


0 

0 


0 

0 


—A 2 Ai —A 2 
0 — A 2 Ai ) 


with A\, A 2 being the same as that in (25). The FCT used here is the same as (26) except that 
now F is of order of (M — 1)(JV + 1) by (M — 1 )(N + 1). 

The FST is given by 

A = SAS -1 , 

where S is an ( N + 1 )(M — 1) by ( N + 1 )(M — 1) block diagonal matrix with diagonal submatrix 


- 1) being 




sMji) 

••• sm ( L itr L ) \ 

sMjj) 

sin(j |) 

... 

sin{^j) 


... 


«n(2^) 

... 


Si = 


To apply FST, we need to rearrange equation (35) in such a way that the vector P is stored in the 
form of 

P ~ (Pi.o’ P2,oi ' ‘ ' » Pm- i,o’ ' ‘ ‘ ■ Pi.ni P2,ni * ’ ‘ > Pm-\,n). 

After the rearrangement, the matrix A in equation (35) takes the same form as equation (25) but 
with different submatrices, where matrix Ai is an (N + 1) by (N + 1) Teoplitz tridiagonal matrix, 


A 1 = 


A 2 = 


d 

-1 0 ••• 

0 

0 

0 \ 

-1 

d -1 ••• 

0 

0 

0 

0 

0 0 ••• 

-1 

d 

-1 

0 

0 0 ••• 

0 

-1 

d ) 

i -2 is also an (IV + 1) by (IV + 1) Tec 

( 1 

? 0 ••• 

0 

0 

°\ 

l 

4 

1 1 ... 

0 

0 

0 

0 

0 0 ••• 

1 

4 

1 

1 

4 

0 

0 0 ••• 

0 

1 

4 

1 / 


By equation (35). the solution algorithm for Neumann- Dir ichlet problems consists of the fol- 
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lowing steps: 

0) Determine whether FCT or FST should be used and store the data accordingly. 

1) Compute the cosine or sine values to be used in the FCT or FST. 

2) Compute the values of each entry in matrix A. 

3) Compute the vector U x and JJ y in equation (35). 

4) Compute the right-hand side of equation (35). 

5) Apply the inverse of FCT or FST to the right-hand side of equation (35). 

6) Solve the sequence of tridiagonal systems that resulted from the fast transform. 

7) Apply FCT or FST to the solution of each tridiagonal system computed in step 6) to recover P. 

Step 0) takes no computation. For equations on a square domain with partition size N x N, the 
operation count of this algorithm is almost the same as that of the Neumann-Neumann boundary 
problem. Its operation count is approximately 

N 2 { 2 log, N + 6) multiplications and N 2 (3 log 2 N + 11) additions 
= TV 2 (5 log, N 4- 17) operations. 


6 Experiment Results 

Theoretical analyses given in Appendix C show that the newly proposed Helmholtz solver is ap- 
proximately fourth order accurate and is highly efficient. By definition, a numerical method is 
fourth order accurate if and only if when the number of grid points doubles the discretization error 
will decrease at a rate of . Five Helmholtz equations with known exact solutions have been 
chosen as test problems to verify the analytical results and to illustrate the performance gain of the 
high order method. Among the five test problems, two have polynomial solutions, with one having 
a fractional degree. The other three are with solution of sine-exponential function, polynomial- 
cosine function, and two dimensional cosine function, respectively. They represent a large class of 
practical Helmholtz equations. Experimental tests have been conducted on a DEC Station 5000 
to measure the numerical results and execution time. As listed in Tables 1 to 10, experimental 
results match analytical results closely. Measured performance confirms that the newly proposed 
algorithm is highly accurate and efficient. 

Two performance metrics are used in the measurement. The metric Order, defined by 

MaxE(n) 

Order (n, n+1) = log 2 — — , 
v ' MaxE(n + 1) 


measures the order of accuracy of numerical solutions. The term Relative error is defined as 


Relative error = 


\\P-P\U 

mil 
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Table 1 

. Solving P xx + Pyy - 

HP = R on Unit Square with P 

= e x sin(y) 


Fourth Order 

N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

1.10E-04 

7.30E-06 

4.67E-07 

2.95E-08 

1.85E-09 

1.16E-10 

9.40E-12 

Relative error 

4.22E-05 

2.57E-06 

1.58E-07 

9.81E-09 

6.11E-10 

3.80E-11 

3.22E-12 

Order 


3.9 

4.0 

4.0 

4.0 

4.0 

3.6 

Time(seconds) 

0.004 

0.016 

0.066 

0.27 

1.20 

5.3 

24 

Second Order 

N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

1.94E-03 

4.94E-04 

1.24E-04 

3.10E-05 

7.76E-06 

1.94E-06 

4.85E-07 

Relative error 

7.22E-04 

1.69E-04 

4.06E-05 

9.93E-06 

2.46E-06 

6.11E-07 

1.52E-07 

Order 


2.0 

2.0 

2.0 

2.0 

2.0 

2.0 

Time(seconds) 

0.004 

0.012 

0.066 

0.26 

1.13 

5.2 

24 


Table 2. Solving P xx + P yy = R on Unit Square with P — 

cos(xy ) 


Fourth Order 

N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

5.26E-05 

2.90E-06 

1.66E-07 

9.88E-09 

5.99E-10 

3.70E-11 

1.49E-12 

Relative error 

9.32E-06 

5.00E-07 

2.95E-08 

1.79E-09 

1.11E-10 

6.91E-12 

1.71E-13 

Order 


4.2 

4.1 

4.1 

4.0 

4.0 

4.7 

Time(seconds) 

0.004 

0.012 

0.062 

0.28 

1.20 

5.18 

23 

Second Order 

N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

8.15E-04 

2.01E-04 

5.00E-05 

1.25E-05 

3.12E-06 

7.80E-07 

1.95E-07 

Relative error 

1.11E-04 

2.53E-05 

6.05E-06 

1.48E-06 

3.67E-07 

9.12E-08 

2.27E-08 

Order 


2.0 

2.0 

2.0 

2.0 

2.0 

2.0 

Time(seconds) 

0.004 

0.016 

0.062 

0.28 

1.13 

5.04 

23 


where P and P denote the exact solution and computed solution respectively, and || • ||i is the l\ 
norm. All the testing is conducted on the unit square domain [0, 1] x [0, 1] with the same uniform 
mesh size h on each dimension. N — \/h is the number of grid points on each x- and y-dimension. 

Tables 1 to 4 present the time-accurate comparison between the new fourth-order fast solver and 
the traditional second-order fast solver (see Appendix A), for the first four testing problems with 
Neumann-Neumann boundary conditions. Measured experimental results show our new method 
is indeed fourth order accurate and achieves the high order accurate solution without increasing 
execution time, as compared with the conventional second-order fast solver. 

Table 5 lists measured experimental results for a special Poisson equation whose solution is a 
polynomial of degree four. The truncation error given in Appendix C has only derivatives of fifth 
order or higher when A = 0, which implies that our high order solver for the Poisson equation 
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Table 3. Solving P xx + P yy - P = R on Unit Square with P = (x 3 - x 2 )cosy 


Fourth Order 


N - 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

7.73E-05 

4.90E-06 

3.07E-07 

1.92E-08 

1.20E-09 

7.50E-11 

4.64E-12 

Relative error 

5.72E-04 

3.12E-05 

1.83E-06 

1.10E-07 

6.79E-09 

4.21E-10 

2.63E-11 

Order 


4.0 

4.0 

4.0 

4.0 

4.0 

4.0 

Time(seconds) 

0.008 

0.016 

0.086 

0.269 

1.24 

5.30 

24 

Second Order 

N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

6.86E-03 

1.72E-03 

4.31E-04 

1.08E-04 

2.69E-05 

6.73E-06 

1.68E-06 

Relative error 

5.46E-02 

1.21E-02 

2.83E-03 

6.85E-04 

1.68E-04 

4.18E-05 

1.04E-05 

Order 


2.0 

2.0 

2.0 

2.0 

2.0 

2.0 

Time(seconds) 

0.004 

0.012 

0.059 

0.262 

1.14 

5.02 

24 


will give an exact solution for polynomials of degree four or lower. This implication is verified by 
the testing results. The table shows that the fourth-order direct Neumann solver gives the exact 
solution while the second order method cannot reach high accuracy even with enlarged problem 
size and with extended execution time. 

Table 6 compares the measured execution time of the conventional second-order faster Poisson 
solver and the newly-proposed fourth-order solver. The first four testing problems are solved by 
the fourth-order algorithm. Then, the same problems are solved with the conventional second 
order method to match the achieved accuracy with increased number of grid points and execution 
time. The execution times of the fourth order algorithm and the second order algorithm are listed 
side-by-side in Table 6 for each of the testing problems. Table 6 shows that the new method is 300 
to 1500 times faster, as indicated by the column of time ratio for the two solvers. Notice that the 
performance gain increases largely when the problem size increase with the problem domain. This 
time ratio increase is no surprise. It is around N 2 / 4 and is well predicted by the range [^JV 2 ,2iV 2 ] 
given in Section 4. The new algorithm is well suitable for scalable computing where problem size 
increases with the computational power. 

Tables 7 to 10 list the experimental results of the first four testing problems with correspond- 
ing Neumann-Dirichlet boundary conditions. As confirmed by the measured results, solutions of 
Neumann-Dirichlet problems are also of fourth order accurate. In addition, they even have a smaller 
error than that of the Neumann-Neumann boundary problem, since there is no discretization error 
arising along x-dimensional Dirichlet boundary conditions. This feature is also well matched by 
our experiment results. 
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Table 4. Solving P xx + P yy — 24.75P — R on Unit Square with P = x 55 + y 5 * 5 


Fourth Order 


N = 

16 

32 

64 

128 

256 

512 

1024 

Maximal error 

4.75E-04 

3.46E-05 

2.32E-06 

1.50E-07 

9.55E-09 

6.01E-10 

3.84E-11 

Relative error 

3.45E-04 

2.50E-05 

1.67E-06 

1.07E-07 

6.81E-09 

4.28E-10 

2.67E-11 

Order 


3.8 

3.9 

4.0 

4.0 

4.0 

4.0 

Time(seconds) 

0.016 

0.059 

0.285 

1.24 

5.30 

23 

102 

Second Order 

N = 

16 

32 

64 

128 

256 

512 

1024 

Maximal error 

1.82E-02 

4.61E-03 

1.15E-03 

2.89E-04 

7.22E-05 

1.81E-05 

4.51E-06 

Relative error 

8.34E-03 

2.09E-03 

5.20E-04 

1.30E-04 

3.24E-05 

8.09E-06 

2.02E-06 

Order 


2.0 

2.0 

2.0 

2.0 

2.0 

2.0 

Time (seconds) 

0.016 

0.055 

0.281 

1.13 

5.05 

25 

97 


Table 5. Solving P xx + Py y = jR on Unit Square with P = x 4 -f y 4 


Method 

Fourth Order 

Second Order 

Second Order 

N = 

8 

256 

1024 

Maximal error 

6.66E-16 

3.05E-05 

1.91E-06 

Relative error 

1.09E-15 

2.53E-05 

1.59E-06 

Time(seconds) 

0.004 

5.04 

100 


7 Conclusion 

Solving Helmholtz equations is a fundamental problem of scientific computing. Since Hockney 
first proposed the so-called “fast Poisson solver” in 1965, intensive research has been done in the 
field to develop fast direct solvers. However, while significant progress has been made during the 
years, fourth-order fast solvers are only currently available for Helmholtz equations with Dirichlet 
boundary conditions. Based on a novel compact finite-difference scheme, a fourth-order fast di- 
rect solver is proposed for Helmholtz equations with Neumann-Neumann and Neumann-Dirichlet 
boundary conditions on a rectangular domain in this study. Accuracy and efficiency of the fourth 
order algorithm are carefully examined. Theoretical and experimental results show that the newly 
proposed algorithm is highly efficient. It can obtain fourth order solution as fast as the conventional 
method achieving second order solutions, and could be thousands of times faster than that of the 
conventional method if accurate solutions are required. 

The search for a better parallel solver is the original motivation behind this research. The 
proposed algorithm is parallel in nature. Although parallel implementation is not presented in this 
study, since the algorithm has a similar data structure as that of the conventional direct method, its 
parallelization is straight forward and high speedup is expected [13, 14]. The same uniform mesh 
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Table 6. Computation time of the two methods for the same accuracy 


Problem 

Method 

N 

Maximal error 

Time (seconds) 

Time ratio 

i 

Order 4 

32 

4.67E-07 

0.066 



Order 2 

512 

4.85E-07 

24 

364 

i 

Order 4 

64 

2.95E-08 

0.27 



Order 2 

2048 

3.03E-08 

423 

1570 

2 

Order 4 

32 

1.66E-07 

0.062 



Order 2 

512 

1.95E-07 

24 

389 

2 

Order 4 

64 

9.88E-09 

0.28 



Order 2 

2048 

1.22E-08 

423 

1510 

3 

Order 4 

16 

4.90E-06 

0.016 



Order 2 

256 

6.73E-06 

5.02 

314 

3 

Order 4 

32 

3.07E-07 

0.062 



Order 2 

1024 

4.21E-07 

97 

1560 

4 

Order 4 

32 

3.46E-05 

0.059 



Order 2 

512 

1.81E-05 

25 

424 

4 

Order 4 

64 

2.32E-06 

0.285 



Order 2 

2048 

1.13E-06 

422 

1480 


Table 7. Neumann-Dirichlet problem: P zx + P yy - HP = R on unit square with P = e x sin{y) 


By FCT 


N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

6.16E-05 

4.02E-06 

2.54E-07 

1.59E-08 

9.97E-10 

6.25E-11 

5.17E-12 

Relative error 

2.28E-05 

1.21E-06 

6.83E-08 

4.04E-09 

2.46E-10 

1.51E-11 

1.65E-12 

Order 


3.9 

4.0 

4.0 

4.0 

4.0 

3.6 

Time(seconds) 

0.004 

0.016 

0.066 

0.27 

1.20 

5.3 

24 


size is used in both x- and y-dimension in our discussion. The restriction of the same uniform mesh 
size is for the sack of brevity. Different uniform mesh sizes can be applied to x- and y-dimension 
respectively. Similar to the conventional second order solver, Fourier transform is used in the newly 
introduced method to hasten computation. As observed by Hockney, the Fourier transform based 
approach is a special case of the FACR(Z) method with Z = 0 [5]. By combining Z steps of cyclic 
reduction with the newly proposed algorithm, an FACR(/)-like algorithm could be formed to further 
improve current results for optimal solution. Also, with appropriate modifications, the fourth order 
algorithm is likely extendible to six order solutions and to more general boundary and domain 
conditions. Many issues have opened for future work. 
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Table 8. Neumann-Dirichlet problem: P xx + P yy — R on unit square with P = cos(xy) 


By FCT 


N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

1.08E-05 

7.59E-07 

4.97E-08 

3.17E-09 

2.00E-10 

1.26E-11 

5.39E-13 

Relative error 

2.10E-06 

1.10E-07 

6.34E-09 

3.85E-10 

2.38E-11 

1.50E-12 

7.38E-14 

Order 


3.8 

3.9 

4.0 

4.0 

4.0 

4.5 

Time(seconds) 

0.004 

0.019 

0.066 

0.26 

1.23 

6.1 

23.7 


Table 9. Neumann-Dirichlet problem: Pxx + Pyy P = R on unit square with P — (x 3 — x 2 )cosy 


By FST 


N = 

4 

8 

16 

32 

64 

128 

256 

Maximal error 

6.98E-05 

5.98E-06 

4.16E-07 

2.72E-08 

1.73E-09 

1.09E-10 

6.89E-12 

Relative error 

2.23E-04 

1.34E-05 

8.25E-07 

5.20E-08 

3.30E-09 

2.08E-10 

1.40E-11 

Order 


3.5 

3.8 

3.9 

4.0 

4.0 

4.0 

Time (seconds) 

0.001 

0.008 

0.016 

0.086 

0.269 

1.24 

5.30 


Appendices 


A Direct Second Order Solution 

Using traditional finite-difference scheme, the second order approximation of the Laplacian is given 
by 


( 1 


hr 


1 -4 1 | P 1 ' 3 = + 0(h 2 ), 

V 1 

and the discretization of the Neumann boundary conditions is given by 


(36) 


p-l J _ pl,j 

2 h 


= P^PO(h 2 ). 


(37) 


Table 10. Neumann-Dirichlet problem: P xx + Pyy — 24.75 P = R on unit square with P = x 5 ' 5 + y 5,5 


By FST 


N = 

8 

16 

32 

64 

128 

256 

512 

Maximal error 

2.54E-03 

2.26E-04 

1.65E-05 

1.10E-06 

7.13E-08 

4.53E-09 

2.85E-10 

Relative error 

2.02E-03 

1.46E-04 

9.46E-06 

5.96E-07 

3.73E-08 

2.33E-09 

1.44E-10 

Order 


3.5 

3.8 

3.9 

3.9 

4.0 

4.0 

Time(seconds) 

0.004 

0.016 

0.059 

0.285 

1.24 

5.3 

22 
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Applying (36) and (37) to discretize equation (1) and boundary condition (19) respectively, yielding 


D P = -h 2 R + 2hU + 0(h 3 ) , 


(38) 


where the vector U stores the linear combination of the function values of b(x,y) in (19) at different 
boundary points, and matrix D has the same structure as matrix A given by (25) except that the 
submatrix A 2 is the N x N identity matrix, and A x is given by 


Ai = 


4 

+ Xh 2 

-2 

0 • 

0 

0 

° \ 


-1 

4 + Xh 2 

-1 • 

0 

0 

0 


0 

0 

0 - 

-1 

4 + Xh 2 

-1 


0 

0 

0 • 

0 

-2 

4 + Xh 2 / 


The truncation error of (38) is 


h* 

( d 4 

+ 

d 4 

12 

V dx A 

dy* 

h 4 

j d 4 

1 

d 4 

12 

\dx* 

T 

dy* 

h 4 

( d 4 

+ 

d 4 

12 

\ dx* 

dy* 


phj 


for interior point (i , ])■ 
for boundary point (0 ,j). 


p °,0 _ h? p0,0 f or boundary point (0, 0). 


Since the solution error depends not only on the truncation error, but also on the matrix of the 
discretized system, a study of matrix D is needed. 

Matrix D has the same eigenvectors as that of matrix A in equation (25). Eigenvalues of matrix 
D are 

d kJ = 4 + Xh 2 - 2cos{-^) - 2cos(-^) , (39) 

for k = 0, 1, ..., M and 1 = 0,1 , .... N. Compared with the eigenvalues of matrix A given in (46), we 
can see that 


dk.l = ^kd - 1 


Xh 2 4- 4 


, kn 


In , 


Xh 2 - cos{-—)cos(—) < X kj i . 


8 ' M N 

\ f 

Like the eigenvalues of matrix A. d k j are all positive, and satisfy the monotone property 


d k - 1 ,( < dk,i, and d k ,i- 1 < d k j, 

and d 0 .o € 0(h 2 ) and d k ,i € 0(1) for (k, l ) such that k > $ or l > A. Following a similar analysis to 
that of the fourth order method given in appendix C, we can conclude that the solution error of the 
second order method ranges from O(h) to 0(h 3 ) when considering all possible extreme situations, 
and in general, will be around 0(h 2 ). 

The conventional second-order direct solver, which is first proposed by Hockney [4] and modified 
and extended by many since then [12], is a Fourier transform based algorithm. It goes through the 
same seven steps as the fourth order solution listed in Section 4, but with different computations in 
steps 2), 3) and 4) due to the difference in discretization. The conventional second order method 
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has 3 TV multiplications and TV additions for step 2), 47V multiplications and 47V additions for step 
3), and 47V additions for step 4). Therefore, the total operation count of the second order fast 
solver is 

TV 2 (2 log 2 TV + 5) 4- 7V(2 log 2 TV 4- 7) multiplications and 

7V 2 (3 log 2 TV + 7) + 7V(— 2 log 2 TV 4- 14) additions (40) 

= TV 2 (51og 2 TV + 12) -f 217V operations. 

B The Influence Matrix Method 

A Poisson-Neumann problem can be solved by solving two Poisson- Dir ichlet problems through the 
influence matrix method. Here we briefly describe the influence matrix method based on [7]. 

To solve the Neumann problem (1) with boundary condition ^ = 0, a sequence of solutions to 
the following problem is first determined: 


A q { - \q l = 0 
Q l = Sij 


(41) 


for each discrete boundary grid point xj. So there are 47V equations on a square of partition size 
TV x TV. The Dirac delta function 8 hJ is defined as 8 — 1 for i = j and 8 ^ = 0 for i ^ j. Upon 
computation of the vectors of normal gradients at all the boundary points, these vectors are 
then stored in columns to yield a matrix Ijvf that is referred to as the influence matrix. 

The Neumann problem then is equivalent to the following solution of two Dirichlet problems. 
First, solve 

APi — XPi — R in n 

p n ( 42 ) 

P\ — 0 on oil . 

Again, compute the gradients normal to the boundary and store them in vector G. Then, solve 


AP 2 “ AP 2 =0 in 

(43) 

p 2 = 1“^ G on dQ . 

The final solution that satisfies the original equation (1) and boundary condition (19) is Pi — P 2 . 

The influence matrix method can be combined with the direct fourth-order Dirichlet solver [6] 
to solve Equation (1) with Neumann conditions. Solving (41) and (42) by the fourth order method 
of [6] results in two linear systems with truncation error of order 0(/i 6 ), with a same symmetric 
matrix with eigenvectors 

n / qA:,/ nh y l qAt,/ Qk,l ok,l ^k,l \ ^ 

dk,l — ^ 1 , 2 ’ •'*’ 5 ••• 5 ’ 

where S^j = .sm(~^) sirt(^r) with respective eigenvalues Xk.i being the same as the eigenvalues 
given by (46, for k — 1,2 , M — 1, and l = 1,2, ..., N — 1. 

Note that the eigenvalues A k,i are the same as those of the proposed high order direct solver 
except for k = 0, M or / = 0, N. We also have that is of 0(h 2 ), and is of 0(1). So 
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based on the error analysis given in appendix C, solving (41) and (42) results in solutions with error 
ranging from 0{h 4 ) to 0{h 6 ). Suppose they achieve 0(h 6 ). Then after computing the influence 
matrix I \- /• and the vector G . the truncation error of (43) increases to at least 0{h 5 ). because first 
order differentiation reduces the order by one. Also the discrete system of (43) has the same matrix 
as (41) and (42), which in the sense of h norm is not better than matrix A of (22). 

The operation count of the influence matrix method is mainly due to the computation of the 
influence matrix, which requires solving a sequence of AN Helmholtz-Dirichlet equations given by 
(41) on a square domain of size N x N. To solve each equation in the sequence (41) using cyclic 
reduction and Fourier analysis based method has an asymptotic operation count of 3iV 2 log 2 (log 2 N) 
multiplications and 3N 2 log 2 (log 2 N) additions as given in the review paper [13]. Therefore the 
computation count of the influence matrix is 

12V 3 log 2 (log 2 N) multiplications and 12V 3 log 2 (log 2 N) additions. (44) 


C Error Estimation of the Fourth-Order Neumann Solver 


In this section, we give the error analysis of the newly proposed high-order compact finite-difference 
discretization. First we derive the truncation error of the discretized linear system, and then an 
eigenvalue analysis of the matrix is presented, and finally we give a global solution error estimation 
by using the eigenvalue properties of the matrix and the truncation error derived. 

The truncation error T x of the boundary condition (21) at the boundaries of x-dimension when 
all differential operators replaced by their respective discrete counterparts is: 


T x {0,j) 

T_m.m 


— 120 ftr 5 
_ h' d'° nO.O 


>5 n n 1*4 a3 / * — n n *. 1 1 rfi »— *1 


The truncation error T y of the boundary condition at the boundaries of y-dimension has similar 
formula as T x , and is therefore omitted. 

To eliminate outside point (-1,-1) of (18) at point (0,0), both the x- and y- dimensional 
boundary conditions are needed. And the truncation error of both the x- and y- dimensional 
boundary condition at point (0. 0) is given by 


T™(0,0) = 


h 4 

120 


(m + m ) +tf(& + ^)( a + a ) + + p0. 


After substituting the Laplacian in (18) by its discrete version, the truncation error for (18) is 
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given by 
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So the truncation error of (22) when all differential operators replaced by their respective discrete 


versions is 


T(i,j) 
T(0,j) 
T( 0,0) 


h 2 n(i,j), 

h 2 Ti(0,j) - 2hT x (0,j) - § (T x (0,j - 1)+T x (0,j + 1)) , 

h 2 T x (0, 0) - 2 ft (T x ( 0, 0) + T y (0, 0)) f (T x ( 0, 1) + T y (1, 0) + T xy ( 0, 0)) , 
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The solution error E , i.e. the difference between the exact solution and the computed solution, 
is not only dependent on the truncation error T but also on the matrix A. In fact, 


E — A _1 T. 


(45) 


To see how the matrix A influences the solution error E , we look at its eigenvectors and eigenvalues 
first. 

Matrix A has eigenvectors 

T rt ( T rk,l irk, l . , t rk,l irk, l irk, l \ ^ 

'kj-\ v 0,0> v 0,l> *•* i V 0,N ■ "*5 v Mfl' y M,V'"> v M,N) ’ 
where V^f = coa(^)cos(^), with respective eigenvalues 


. f shir . _ Jn . kn . An. 

X ki i = d~ 2 cos(-) - 2co.s( — ) - cos( — )cos( — ) 


( 46 ) 


for k = 0,1, M, and l = 0, 1, N, and d = 5 + ^tiZAft 2 . 
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Matrix A -1 has the same eigenvectors as A with corresponding eigenvalues 1/A k i- Since there 
are ( M + 1 )(N + 1) distinct eigenvalues in the (M + 1 ){N + 1) dimensional discrete space, the 
( M + l)(N + 1) eigenvectors are linear independent and thus span the ( M + 1 ){N + 1) dimensional 
vector space in which we are solving the equation. Therefore the truncation error T can be expended 
in terms of the spanning eigenvectors V kl normalized from for k = 0, 1, M and l = 0,1, N 

35 M N 

T = Y,T, c ki v ki ■ ( 4? ) 

*•=0 1=0 


Thus by (45) 


\\Eh = ||A -1 T||2 


M N 



EE 


cl , 


(48) 


Since in any finite dimensional space l 2 norm || • || 2 and infinity norm || • || are equivalent, we use 
(48) to estimate the error of the proposed high order direct solution. Since the truncation error 
T is of 0(h 5 ). (47) means that Y.k,iCkiVki is of 0{h 5 ). The eigenvalues of A are all positive and 
satisfy 

< A k,i- and A^-i < A 


Therefore 


\ 


M N s~i2 

^ - < l|E|] 2 < 


EE*. 

k - 0 /=0 




M N 

EE^ 1 - 

fc=0 1=0 A o,o 


But A 0 ,o is of 0(h 2 ), A 0 ^ and A^ 0 are of 0{h), and \ k l is of 0(1), for all ( k , /) pairs such that 
k > fjj or / > which implies that the order of E ranges from 0(h 3 ) to 0(h 5 ). For the solution 
to be of 0(h 3 ). the truncation error T must concentrate on the near-zero low frequence (i.e. all 
coefficients C k j are almost equal to 0 for k, l not near zero) in the expansion (47). Assuming 
uniform distribution for the coefficients C k /s, the probability of E being 0{h :i ) is close to zero 
when M and N are large. In general, this high order method is approximately fourth order. 
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